<!DOCTYPE html>
<HTML lang = "en">
<HEAD>
  <meta charset="UTF-8"/>
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
  <title>Solving Stiff Ordinary Differential Equations</title>
  

  <script type="text/x-mathjax-config">
    MathJax.Hub.Config({
      tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},
      TeX: { equationNumbers: { autoNumber: "AMS" } }
    });
  </script>

  <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
  </script>

  
<style>
pre.hljl {
    border: 1px solid #ccc;
    margin: 5px;
    padding: 5px;
    overflow-x: auto;
    color: rgb(68,68,68); background-color: rgb(251,251,251); }
pre.hljl > span.hljl-t { }
pre.hljl > span.hljl-w { }
pre.hljl > span.hljl-e { }
pre.hljl > span.hljl-eB { }
pre.hljl > span.hljl-o { }
pre.hljl > span.hljl-k { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kc { color: rgb(59,151,46); font-style: italic; }
pre.hljl > span.hljl-kd { color: rgb(214,102,97); font-style: italic; }
pre.hljl > span.hljl-kn { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kp { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kr { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kt { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-n { }
pre.hljl > span.hljl-na { }
pre.hljl > span.hljl-nb { }
pre.hljl > span.hljl-nbp { }
pre.hljl > span.hljl-nc { }
pre.hljl > span.hljl-ncB { }
pre.hljl > span.hljl-nd { color: rgb(214,102,97); }
pre.hljl > span.hljl-ne { }
pre.hljl > span.hljl-neB { }
pre.hljl > span.hljl-nf { color: rgb(66,102,213); }
pre.hljl > span.hljl-nfm { color: rgb(66,102,213); }
pre.hljl > span.hljl-np { }
pre.hljl > span.hljl-nl { }
pre.hljl > span.hljl-nn { }
pre.hljl > span.hljl-no { }
pre.hljl > span.hljl-nt { }
pre.hljl > span.hljl-nv { }
pre.hljl > span.hljl-nvc { }
pre.hljl > span.hljl-nvg { }
pre.hljl > span.hljl-nvi { }
pre.hljl > span.hljl-nvm { }
pre.hljl > span.hljl-l { }
pre.hljl > span.hljl-ld { color: rgb(148,91,176); font-style: italic; }
pre.hljl > span.hljl-s { color: rgb(201,61,57); }
pre.hljl > span.hljl-sa { color: rgb(201,61,57); }
pre.hljl > span.hljl-sb { color: rgb(201,61,57); }
pre.hljl > span.hljl-sc { color: rgb(201,61,57); }
pre.hljl > span.hljl-sd { color: rgb(201,61,57); }
pre.hljl > span.hljl-sdB { color: rgb(201,61,57); }
pre.hljl > span.hljl-sdC { color: rgb(201,61,57); }
pre.hljl > span.hljl-se { color: rgb(59,151,46); }
pre.hljl > span.hljl-sh { color: rgb(201,61,57); }
pre.hljl > span.hljl-si { }
pre.hljl > span.hljl-so { color: rgb(201,61,57); }
pre.hljl > span.hljl-sr { color: rgb(201,61,57); }
pre.hljl > span.hljl-ss { color: rgb(201,61,57); }
pre.hljl > span.hljl-ssB { color: rgb(201,61,57); }
pre.hljl > span.hljl-nB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nbB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nfB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nh { color: rgb(59,151,46); }
pre.hljl > span.hljl-ni { color: rgb(59,151,46); }
pre.hljl > span.hljl-nil { color: rgb(59,151,46); }
pre.hljl > span.hljl-noB { color: rgb(59,151,46); }
pre.hljl > span.hljl-oB { color: rgb(102,102,102); font-weight: bold; }
pre.hljl > span.hljl-ow { color: rgb(102,102,102); font-weight: bold; }
pre.hljl > span.hljl-p { }
pre.hljl > span.hljl-c { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-ch { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cm { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cp { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cpB { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cs { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-csB { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-g { }
pre.hljl > span.hljl-gd { }
pre.hljl > span.hljl-ge { }
pre.hljl > span.hljl-geB { }
pre.hljl > span.hljl-gh { }
pre.hljl > span.hljl-gi { }
pre.hljl > span.hljl-go { }
pre.hljl > span.hljl-gp { }
pre.hljl > span.hljl-gs { }
pre.hljl > span.hljl-gsB { }
pre.hljl > span.hljl-gt { }
</style>



  <style type="text/css">
  @font-face {
  font-style: normal;
  font-weight: 300;
}
@font-face {
  font-style: normal;
  font-weight: 400;
}
@font-face {
  font-style: normal;
  font-weight: 600;
}
html {
  font-family: sans-serif; /* 1 */
  -ms-text-size-adjust: 100%; /* 2 */
  -webkit-text-size-adjust: 100%; /* 2 */
}
body {
  margin: 0;
}
article,
aside,
details,
figcaption,
figure,
footer,
header,
hgroup,
main,
menu,
nav,
section,
summary {
  display: block;
}
audio,
canvas,
progress,
video {
  display: inline-block; /* 1 */
  vertical-align: baseline; /* 2 */
}
audio:not([controls]) {
  display: none;
  height: 0;
}
[hidden],
template {
  display: none;
}
a:active,
a:hover {
  outline: 0;
}
abbr[title] {
  border-bottom: 1px dotted;
}
b,
strong {
  font-weight: bold;
}
dfn {
  font-style: italic;
}
h1 {
  font-size: 2em;
  margin: 0.67em 0;
}
mark {
  background: #ff0;
  color: #000;
}
small {
  font-size: 80%;
}
sub,
sup {
  font-size: 75%;
  line-height: 0;
  position: relative;
  vertical-align: baseline;
}
sup {
  top: -0.5em;
}
sub {
  bottom: -0.25em;
}
img {
  border: 0;
}
svg:not(:root) {
  overflow: hidden;
}
figure {
  margin: 1em 40px;
}
hr {
  -moz-box-sizing: content-box;
  box-sizing: content-box;
  height: 0;
}
pre {
  overflow: auto;
}
code,
kbd,
pre,
samp {
  font-family: monospace, monospace;
  font-size: 1em;
}
button,
input,
optgroup,
select,
textarea {
  color: inherit; /* 1 */
  font: inherit; /* 2 */
  margin: 0; /* 3 */
}
button {
  overflow: visible;
}
button,
select {
  text-transform: none;
}
button,
html input[type="button"], /* 1 */
input[type="reset"],
input[type="submit"] {
  -webkit-appearance: button; /* 2 */
  cursor: pointer; /* 3 */
}
button[disabled],
html input[disabled] {
  cursor: default;
}
button::-moz-focus-inner,
input::-moz-focus-inner {
  border: 0;
  padding: 0;
}
input {
  line-height: normal;
}
input[type="checkbox"],
input[type="radio"] {
  box-sizing: border-box; /* 1 */
  padding: 0; /* 2 */
}
input[type="number"]::-webkit-inner-spin-button,
input[type="number"]::-webkit-outer-spin-button {
  height: auto;
}
input[type="search"] {
  -webkit-appearance: textfield; /* 1 */
  -moz-box-sizing: content-box;
  -webkit-box-sizing: content-box; /* 2 */
  box-sizing: content-box;
}
input[type="search"]::-webkit-search-cancel-button,
input[type="search"]::-webkit-search-decoration {
  -webkit-appearance: none;
}
fieldset {
  border: 1px solid #c0c0c0;
  margin: 0 2px;
  padding: 0.35em 0.625em 0.75em;
}
legend {
  border: 0; /* 1 */
  padding: 0; /* 2 */
}
textarea {
  overflow: auto;
}
optgroup {
  font-weight: bold;
}
table {
  font-family: monospace, monospace;
  font-size : 0.8em;
  border-collapse: collapse;
  border-spacing: 0;
}
td,
th {
  padding: 0;
}
thead th {
    border-bottom: 1px solid black;
    background-color: white;
}
tr:nth-child(odd){
  background-color: rgb(248,248,248);
}


/*
* Skeleton V2.0.4
* Copyright 2014, Dave Gamache
* www.getskeleton.com
* Free to use under the MIT license.
* http://www.opensource.org/licenses/mit-license.php
* 12/29/2014
*/
.container {
  position: relative;
  width: 100%;
  max-width: 960px;
  margin: 0 auto;
  padding: 0 20px;
  box-sizing: border-box; }
.column,
.columns {
  width: 100%;
  float: left;
  box-sizing: border-box; }
@media (min-width: 400px) {
  .container {
    width: 85%;
    padding: 0; }
}
@media (min-width: 550px) {
  .container {
    width: 80%; }
  .column,
  .columns {
    margin-left: 4%; }
  .column:first-child,
  .columns:first-child {
    margin-left: 0; }

  .one.column,
  .one.columns                    { width: 4.66666666667%; }
  .two.columns                    { width: 13.3333333333%; }
  .three.columns                  { width: 22%;            }
  .four.columns                   { width: 30.6666666667%; }
  .five.columns                   { width: 39.3333333333%; }
  .six.columns                    { width: 48%;            }
  .seven.columns                  { width: 56.6666666667%; }
  .eight.columns                  { width: 65.3333333333%; }
  .nine.columns                   { width: 74.0%;          }
  .ten.columns                    { width: 82.6666666667%; }
  .eleven.columns                 { width: 91.3333333333%; }
  .twelve.columns                 { width: 100%; margin-left: 0; }

  .one-third.column               { width: 30.6666666667%; }
  .two-thirds.column              { width: 65.3333333333%; }

  .one-half.column                { width: 48%; }

  /* Offsets */
  .offset-by-one.column,
  .offset-by-one.columns          { margin-left: 8.66666666667%; }
  .offset-by-two.column,
  .offset-by-two.columns          { margin-left: 17.3333333333%; }
  .offset-by-three.column,
  .offset-by-three.columns        { margin-left: 26%;            }
  .offset-by-four.column,
  .offset-by-four.columns         { margin-left: 34.6666666667%; }
  .offset-by-five.column,
  .offset-by-five.columns         { margin-left: 43.3333333333%; }
  .offset-by-six.column,
  .offset-by-six.columns          { margin-left: 52%;            }
  .offset-by-seven.column,
  .offset-by-seven.columns        { margin-left: 60.6666666667%; }
  .offset-by-eight.column,
  .offset-by-eight.columns        { margin-left: 69.3333333333%; }
  .offset-by-nine.column,
  .offset-by-nine.columns         { margin-left: 78.0%;          }
  .offset-by-ten.column,
  .offset-by-ten.columns          { margin-left: 86.6666666667%; }
  .offset-by-eleven.column,
  .offset-by-eleven.columns       { margin-left: 95.3333333333%; }

  .offset-by-one-third.column,
  .offset-by-one-third.columns    { margin-left: 34.6666666667%; }
  .offset-by-two-thirds.column,
  .offset-by-two-thirds.columns   { margin-left: 69.3333333333%; }

  .offset-by-one-half.column,
  .offset-by-one-half.columns     { margin-left: 52%; }

}
html {
  font-size: 62.5%; }
body {
  font-size: 1.5em; /* currently ems cause chrome bug misinterpreting rems on body element */
  line-height: 1.6;
  font-weight: 400;
  font-family: "Raleway", "HelveticaNeue", "Helvetica Neue", Helvetica, Arial, sans-serif;
  color: #222; }
h1, h2, h3, h4, h5, h6 {
  margin-top: 0;
  margin-bottom: 2rem;
  font-weight: 300; }
h1 { font-size: 3.6rem; line-height: 1.2;  letter-spacing: -.1rem;}
h2 { font-size: 3.4rem; line-height: 1.25; letter-spacing: -.1rem; }
h3 { font-size: 3.2rem; line-height: 1.3;  letter-spacing: -.1rem; }
h4 { font-size: 2.8rem; line-height: 1.35; letter-spacing: -.08rem; }
h5 { font-size: 2.4rem; line-height: 1.5;  letter-spacing: -.05rem; }
h6 { font-size: 1.5rem; line-height: 1.6;  letter-spacing: 0; }

p {
  margin-top: 0; }
a {
  color: #1EAEDB; }
a:hover {
  color: #0FA0CE; }
.button,
button,
input[type="submit"],
input[type="reset"],
input[type="button"] {
  display: inline-block;
  height: 38px;
  padding: 0 30px;
  color: #555;
  text-align: center;
  font-size: 11px;
  font-weight: 600;
  line-height: 38px;
  letter-spacing: .1rem;
  text-transform: uppercase;
  text-decoration: none;
  white-space: nowrap;
  background-color: transparent;
  border-radius: 4px;
  border: 1px solid #bbb;
  cursor: pointer;
  box-sizing: border-box; }
.button:hover,
button:hover,
input[type="submit"]:hover,
input[type="reset"]:hover,
input[type="button"]:hover,
.button:focus,
button:focus,
input[type="submit"]:focus,
input[type="reset"]:focus,
input[type="button"]:focus {
  color: #333;
  border-color: #888;
  outline: 0; }
.button.button-primary,
button.button-primary,
input[type="submit"].button-primary,
input[type="reset"].button-primary,
input[type="button"].button-primary {
  color: #FFF;
  background-color: #33C3F0;
  border-color: #33C3F0; }
.button.button-primary:hover,
button.button-primary:hover,
input[type="submit"].button-primary:hover,
input[type="reset"].button-primary:hover,
input[type="button"].button-primary:hover,
.button.button-primary:focus,
button.button-primary:focus,
input[type="submit"].button-primary:focus,
input[type="reset"].button-primary:focus,
input[type="button"].button-primary:focus {
  color: #FFF;
  background-color: #1EAEDB;
  border-color: #1EAEDB; }
input[type="email"],
input[type="number"],
input[type="search"],
input[type="text"],
input[type="tel"],
input[type="url"],
input[type="password"],
textarea,
select {
  height: 38px;
  padding: 6px 10px; /* The 6px vertically centers text on FF, ignored by Webkit */
  background-color: #fff;
  border: 1px solid #D1D1D1;
  border-radius: 4px;
  box-shadow: none;
  box-sizing: border-box; }
/* Removes awkward default styles on some inputs for iOS */
input[type="email"],
input[type="number"],
input[type="search"],
input[type="text"],
input[type="tel"],
input[type="url"],
input[type="password"],
textarea {
  -webkit-appearance: none;
     -moz-appearance: none;
          appearance: none; }
textarea {
  min-height: 65px;
  padding-top: 6px;
  padding-bottom: 6px; }
input[type="email"]:focus,
input[type="number"]:focus,
input[type="search"]:focus,
input[type="text"]:focus,
input[type="tel"]:focus,
input[type="url"]:focus,
input[type="password"]:focus,
textarea:focus,
select:focus {
  border: 1px solid #33C3F0;
  outline: 0; }
label,
legend {
  display: block;
  margin-bottom: .5rem;
  font-weight: 600; }
fieldset {
  padding: 0;
  border-width: 0; }
input[type="checkbox"],
input[type="radio"] {
  display: inline; }
label > .label-body {
  display: inline-block;
  margin-left: .5rem;
  font-weight: normal; }
ul {
  list-style: circle; }
ol {
  list-style: decimal; }
ul ul,
ul ol,
ol ol,
ol ul {
  margin: 1.5rem 0 1.5rem 3rem;
  font-size: 90%; }
li > p {margin : 0;}
th,
td {
  padding: 12px 15px;
  text-align: left;
  border-bottom: 1px solid #E1E1E1; }
th:first-child,
td:first-child {
  padding-left: 0; }
th:last-child,
td:last-child {
  padding-right: 0; }
button,
.button {
  margin-bottom: 1rem; }
input,
textarea,
select,
fieldset {
  margin-bottom: 1.5rem; }
pre,
blockquote,
dl,
figure,
table,
p,
ul,
ol,
form {
  margin-bottom: 1.0rem; }
.u-full-width {
  width: 100%;
  box-sizing: border-box; }
.u-max-full-width {
  max-width: 100%;
  box-sizing: border-box; }
.u-pull-right {
  float: right; }
.u-pull-left {
  float: left; }
hr {
  margin-top: 3rem;
  margin-bottom: 3.5rem;
  border-width: 0;
  border-top: 1px solid #E1E1E1; }
.container:after,
.row:after,
.u-cf {
  content: "";
  display: table;
  clear: both; }

pre {
  display: block;
  padding: 9.5px;
  margin: 0 0 10px;
  font-size: 13px;
  line-height: 1.42857143;
  word-break: break-all;
  word-wrap: break-word;
  border: 1px solid #ccc;
  border-radius: 4px;
}

pre.hljl {
  margin: 0 0 10px;
  display: block;
  background: #f5f5f5;
  border-radius: 4px;
  padding : 5px;
}

pre.output {
  background: #ffffff;
}

pre.code {
  background: #ffffff;
}

pre.julia-error {
  color : red
}

code,
kbd,
pre,
samp {
  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
  font-size: 0.9em;
}


@media (min-width: 400px) {}
@media (min-width: 550px) {}
@media (min-width: 750px) {}
@media (min-width: 1000px) {}
@media (min-width: 1200px) {}

h1.title {margin-top : 20px}
img {max-width : 100%}
div.title {text-align: center;}

  </style>
</HEAD>

<BODY>
  <div class ="container">
    <div class = "row">
      <div class = "col-md-12 twelve columns">
        <div class="title">
          <h1 class="title">Solving Stiff Ordinary Differential Equations</h1>
          <h5>Chris Rackauckas</h5>
          <h5>October 14th, 2020</h5>
        </div>

        <p>We have previously shown how to solve non-stiff ODEs via optimized Runge-Kutta methods, but we ended by showing that there is a fundamental limitation of these methods when attempting to solve stiff ordinary differential equations. However, we can get around these limitations by using different types of methods, like implicit Euler. Let&#39;s now go down the path of understanding how to efficiently implement stiff ordinary differential equation solvers, and its interaction with other domains like automatic differentiation.</p>
<p>When one is solving a large-scale scientific computing problem with MPI, this is almost always the piece of code where all of the time is spent, so let&#39;s understand how what it&#39;s doing.</p>
<h2>Newton&#39;s Method and Jacobians</h2>
<p>Recall that the implicit Euler method is the following:</p>
<p class="math">\[
u_{n+1} = u_n + \Delta t f(u_{n+1},p,t + \Delta t)
\]</p>
<p>If we wanted to use this method, we would need to find out how to get the value <span class="math">$u_{n+1}$</span> when only knowing the value <span class="math">$u_n$</span>. To do so, we can move everything to one side:</p>
<p class="math">\[
u_{n+1} - \Delta t f(u_{n+1},p,t + \Delta t) - u_n = 0
\]</p>
<p>and now we have a problem</p>
<p class="math">\[
g(u_{n+1}) = 0
\]</p>
<p>This is the classic rootfinding problem <span class="math">$g(x)=0$</span>, find <span class="math">$x$</span>. The way that we solve the rootfinding problem is, once again, by replacing this problem about a continuous function <span class="math">$g$</span> with a discrete dynamical system whose steady state is the solution to the <span class="math">$g(x)=0$</span>. There are many methods for this, but some choices of the rootfinding method effect the stability of the ODE solver itself since we need to make sure that the steady state solution is a stable steady state of the iteration process, otherwise the rootfinding method will diverge &#40;will be explored in the homework&#41;.</p>
<p>Thus for example, fixed point iteration is not appropriate for stiff differential equations. Methods which are used in the stiff case are either Anderson Acceleration or Newton&#39;s method. Newton&#39;s is by far the most common &#40;and generally performs the best&#41;, so we can go down this route.</p>
<p>Let&#39;s use the syntax <span class="math">$g(x)=0$</span>. Here we need some starting value <span class="math">$x_0$</span> as our first guess for <span class="math">$u_{n+1}$</span>. The easiest guess is <span class="math">$u_{n}$</span>, though additional information about the equation can be used to compute a better starting value &#40;known as a <em>step predictor</em>&#41;. Once we have a starting value, we run the iteration:</p>
<p class="math">\[
x_{k+1} = x_k - J(x_k)^{-1}g(x_k)
\]</p>
<p>where <span class="math">$J(x_k)$</span> is the Jacobian of <span class="math">$g$</span> at the point <span class="math">$x_k$</span>. However, the mathematical formulation is never the syntax that you should use for the actual application&#33; Instead, numerically this is two stages:</p>
<ul>
<li><p>Solve <span class="math">$Ja=g(x_k)$</span> for <span class="math">$a$</span></p>
</li>
<li><p>Update <span class="math">$x_{k+1} = x_k - a$</span></p>
</li>
</ul>
<p>By doing this, we can turn the matrix inversion into a problem of a linear solve and then an update. The reason this is done is manyfold, but one major reason is because the inverse of a sparse matrix can be dense, and this Jacobian is in many cases &#40;PDEs&#41; a large and dense matrix.</p>
<p>Now let&#39;s break this down step by step.</p>
<h3>Some Quick Notes</h3>
<p>The Jacobian of <span class="math">$g$</span> can also be written as <span class="math">$J = I - \gamma \frac{df}{du}$</span> for the ODE <span class="math">$u' = f(u,p,t)$</span>, where <span class="math">$\gamma = \Delta t$</span> for the implicit Euler method. This general form holds for all other &#40;SDIRK&#41; implicit methods, changing the value of <span class="math">$\gamma$</span>. Additionally, the class of Rosenbrock methods solves a linear system with exactly the same <span class="math">$J$</span>, meaning that essentially all implicit and semi-implicit ODE solvers have to do the same Newton iteration process on the same structure. This is the portion of the code that is generally the bottleneck.</p>
<p>Additionally, if one is solving a mass matrix ODE: <span class="math">$Mu' = f(u,p,t)$</span>, exactly the same treatment can be had with <span class="math">$J = M - \gamma \frac{df}{du}$</span>. This works even if <span class="math">$M$</span> is singular, a case known as a <em>differential-algebraic equation</em> or a DAE. A DAE for example can be an ODE with constraint equations, and these structures can be represented as an ODE where these constraints lead to a singularity in the mass matrix &#40;a row of all zeros is a term that is only the right hand side equals zero&#33;&#41;.</p>
<h2>Generation of the Jacobian</h2>
<h3>Dense Finite Differences and Forward-Mode AD</h3>
<p>Recall that the Jacobian is the matrix of <span class="math">$\frac{df_i}{dx_j}$</span> for <span class="math">$f$</span> a vector-valued function. The simplest way to generate the Jacobian is through finite differences. For each <span class="math">$h_j = h e_j$</span> for <span class="math">$e_j$</span> the basis vector of the <span class="math">$j$</span>th axis and some sufficiently small <span class="math">$h$</span>, then we can compute column <span class="math">$j$</span> of the Jacobian by:</p>
<p class="math">\[
\frac{f(x+h_j)-f(x)}{h}
\]</p>
<p>Thus <span class="math">$m+1$</span> applications of <span class="math">$f$</span> are required to compute the full Jacobian.</p>
<p>This can be improved by using forward-mode automatic differentiation. Recall that we can formulate a multidimensional duel number of the form</p>
<p class="math">\[
d = x + v_1 \epsilon_1 + \ldots + v_m \epsilon_m
\]</p>
<p>We can then seed the vectors <span class="math">$v_j = h_j$</span> so that the differentiation directions are along the basis vectors, and then the output dual is the result:</p>
<p class="math">\[
f(d) = f(x) + J_1 \epsilon_1 + \ldots + J_m \epsilon_m
\]</p>
<p>where <span class="math">$J_j$</span> is the <span class="math">$j$</span>th column of the Jacobian. And thus with one calculation of the <em>primal</em> &#40;f&#40;x&#41;&#41; we have calculated the entire Jacobian.</p>
<h3>Sparse Differentiation and Matrix Coloring</h3>
<p>However, when the Jacobian is sparse we can compute it much faster. We can understand this by looking at the following system:</p>
<p class="math">\[
f(x)=\left[\begin{array}{c}
x_{1}+x_{3}\\
x_{2}x_{3}\\
x_{1}
\end{array}\right]
\]</p>
<p>Notice that in 3 differencing steps we can calculate:</p>
<p class="math">\[
f(x+\epsilon e_{1})=\left[\begin{array}{c}
x_{1}+x_{3}+\epsilon\\
x_{2}x_{3}\\
x_{1}+\epsilon
\end{array}\right]
\]</p>
<p class="math">\[
f(x+\epsilon e_{2})=\left[\begin{array}{c}
x_{1}+x_{3}\\
x_{2}x_{3}+\epsilon x_{3}\\
x_{1}
\end{array}\right]
\]</p>
<p class="math">\[
f(x+\epsilon e_{3})=\left[\begin{array}{c}
x_{1}+x_{3}+\epsilon\\
x_{2}x_{3}+\epsilon x_{2}\\
x_{1}
\end{array}\right]
\]</p>
<p>and thus:</p>
<p class="math">\[
\frac{f(x+\epsilon e_{1})-f(x)}{\epsilon}=\left[\begin{array}{c}
1\\
0\\
1
\end{array}\right]
\]</p>
<p class="math">\[
\frac{f(x+\epsilon e_{2})-f(x)}{\epsilon}=\left[\begin{array}{c}
0\\
x_{3}\\
0
\end{array}\right]
\]</p>
<p class="math">\[
\frac{f(x+\epsilon e_{3})-f(x)}{\epsilon}=\left[\begin{array}{c}
1\\
x_{2}\\
0
\end{array}\right]
\]</p>
<p>But notice that the calculation of <span class="math">$e_1$</span> and <span class="math">$e_2$</span> do not interact. If we had done:</p>
<p class="math">\[
\frac{f(x+\epsilon e_{1}+\epsilon e_{2})-f(x)}{\epsilon}=\left[\begin{array}{c}
1\\
x_{3}\\
1
\end{array}\right]
\]</p>
<p>we would still get the correct value for every row because the <span class="math">$\epsilon$</span> terms do not collide &#40;a situation known as <em>perturbation confusion</em>&#41;. If we knew the sparsity pattern of the Jacobian included a 0 at &#40;2,1&#41;, &#40;1,2&#41;, and &#40;3,2&#41;, then we would know that the vectors would have to be <span class="math">$[1 0 1]$</span> and <span class="math">$[0 x_3 0]$</span>, meaning that columns 1 and 2 can be computed simultaniously and decompressed. This is the key to sparse differentiation.</p>
<p><img src="https://user-images.githubusercontent.com/1814174/66027457-efd7cc00-e4c8-11e9-8346-accf468541fb.PNG" alt="" /></p>
<p>With forward-mode automatic differentiation, recall that we calculate multiple dimensions simultaniously by using a multidimensional dual number seeded by the vectors of the differentiation directions, that is:</p>
<p class="math">\[
d = x + v_1 \epsilon_1 + \ldots + v_m \epsilon_m
\]</p>
<p>Instead of using the primitive differentiation directions <span class="math">$e_j$</span>, we can instead replace this with the mixed values. For example, the Jacobian of the example function can be computed in one function call to <span class="math">$f$</span> with the dual number input:</p>
<p class="math">\[
d = x + (e_1 + e_2) \epsilon_1 + e_3 \epsilon_2
\]</p>
<p>and performing the decompression via the sparsity pattern. Thus the sparsity pattern gives a direct way to optimize the construction of the Jacobian.</p>
<p>This idea of independent directions can be formalized as a <em>matrix coloring</em>. Take <span class="math">$S_{ij}$</span> the sparsity pattern of some Jacobian matrix <span class="math">$J_{ij}$</span>. Define a graph on the nodes 1 through m where there is an edge between <span class="math">$i$</span> and <span class="math">$j$</span> if there is a row where <span class="math">$i$</span> and <span class="math">$j$</span> are non-zero. This graph is the column connectivity graph of the Jacobian. What we wish to do is find the smallest set of differentiation directions such that differentiating in the direction of <span class="math">$e_i$</span> does not collide with differentiation in the direction of <span class="math">$e_j$</span>. The connectivity graph is setup so that way this cannot be done if the two nodes are adjacent. If we let the subset of nodes differentiated together be a <em>color</em>, the question is, what is the smallest number of colors s.t. no adjacent nodes are the same color. This is the classic <em>distance-1 coloring problem</em> from graph theory. It is well-known that the problem of finding the <em>chromatic number</em>, the minimal number of colors for a graph, is generally NP-complete. However, there are heuristic methods for performing a distance-1 coloring quite quickly. For example, a greedy algorithm is as follows:</p>
<ul>
<li><p>Pick a node at random to be color 1.</p>
</li>
<li><p>Make all nodes adjacent to that be the lowest color that they can be &#40;in this step that will be 2&#41;.</p>
</li>
<li><p>Now look at all nodes adjacent to that. Make all nodes be the lowest color that they can be &#40;either 1 or 3&#41;.</p>
</li>
<li><p>Repeat by looking at the next set of adjacent nodes and color as conservatively as possible.</p>
</li>
</ul>
<p>This can be visualized as follows:</p>
<p><img src="https://user-images.githubusercontent.com/1814174/66027433-e189b000-e4c8-11e9-8c2e-3999954cda28.PNG" alt="" /></p>
<p>The result will color the entire connected component. While not giving an optimal result, it will still give a result that is a sufficient reduction in the number of differentiation directions &#40;without solving an NP-complete problem&#41; and thus can lead to a large computational saving.</p>
<p>At the end, let <span class="math">$c_i$</span> be the vector of 1&#39;s and 0&#39;s, where it&#39;s 1 for every node that is color <span class="math">$i$</span> and 0 otherwise. Sparse automatic differentiation of the Jacobian is then computed with:</p>
<p class="math">\[
d = x + c_1 \epsilon_1 + \ldots + c_k \epsilon_k
\]</p>
<p>that is, the full Jacobian is computed with one dual number which consists of the primal calculation along with <span class="math">$k$</span> dual dimensions, where <span class="math">$k$</span> is the computed chromatic number of the connectivity graph on the Jacobian. Once this calculation is complete, the colored columns can be decompressed into the full Jacobian using the sparsity information, generating the original quantity that we wanted to compute.</p>
<p>For more information on the graph coloring aspects, find the paper titled &quot;What Color Is Your Jacobian? Graph Coloring for Computing Derivatives&quot; by Gebremedhin.</p>
<h4>Note on Sparse Reverse-Mode AD</h4>
<p>Reverse-mode automatic differentiation can be though of as a method for computing one row of a Jacobian per seed, as opposed to one column per seed given by forward-mode AD. Thus sparse reverse-mode automatic differentiation can be done by looking at the connectivity graph of the column and using the resulting color vectors to seed the reverse accumulation process.</p>
<h2>Linear Solving</h2>
<p>After the Jacobian has been computed, we need to solve a linear equation <span class="math">$Ja=b$</span>. While mathematically you can solve this by computing the inverse <span class="math">$J^{-1}$</span>, this is not a good way to perform the calculation because even if <span class="math">$J$</span> is sparse, then <span class="math">$J^{-1}$</span> is in general dense and thus may not fit into memory &#40;remember, this is <span class="math">$N^2$</span> as many terms, where <span class="math">$N$</span> is the size of the ordinary differential equation that is being solved, so if it&#39;s a large equation it is very feasible and common that the ODE is representable but its full Jacobian is not able to fit into RAM&#41;. Note that some may say that this is done for numerical stability reasons: that is incorrect. In fact, under reasonable assumptions for how the inverse is computed, it will be as numerically stable as other techniques we will mention.</p>
<p>Thus instead of generating the inverse, we can instead perform a <em>matrix factorization</em>. A matrix factorization is a transformation of the matrix into a form that is more amenable to certain analyses. For our purposes, a general Jacobian within a Newton iteration can be transformed via the <em>LU-factorization</em> or &#40;<em>LU-decomposition</em>&#41;, i.e.</p>
<p class="math">\[
J = LU
\]</p>
<p>where <span class="math">$L$</span> is lower triangular and <span class="math">$U$</span> is upper traingular. If we write the linear equation in this form:</p>
<p class="math">\[
LUa = b
\]</p>
<p>then we see that we can solve it by first solving <span class="math">$L(Ua) = b$</span>. Since <span class="math">$L$</span> is lower triangular, this is done by the backsubstitution algorithm. That is, in a lower triangular form, we can solve for the first value since we have:</p>
<p class="math">\[
L_{11} a_1 = b_1
\]</p>
<p>and thus by dividing we solve. For the next term, we have that</p>
<p class="math">\[
L_{21} a_1 + L_{22} a_2 = b_2
\]</p>
<p>and thus we plug in the solution to <span class="math">$a_1$</span> and solve to get <span class="math">$a_2$</span>. The lower traingular form allows this to continue. This occurs in 1&#43;2&#43;3&#43;...&#43;n operations, and is thus O&#40;n^2&#41;. Next, we solve <span class="math">$Ua = b$</span>, which once again is done by a backsubstitution algorithm but in the reverse direction. Together those two operations are O&#40;n^2&#41; and complete the inversion of <span class="math">$LU$</span>.</p>
<p>So is this an O&#40;n^2&#41; algorithm for computing the solution of a linear system? No, because the computation of <span class="math">$LU$</span> itself is an O&#40;n^3&#41; calculation, and thus the true complexity of solving a linear system is still O&#40;n^3&#41;. However, if we have already factorized <span class="math">$J$</span>, then we can repeatedly use the same <span class="math">$LU$</span> factors to solve additional linear problems <span class="math">$Jv = u$</span> with different vectors. We can exploit this to accelerate the Newton method. Instead of doing the calculation:</p>
<p class="math">\[
x_{k+1} = x_k - J(x_k)^{-1}g(x_k)
\]</p>
<p>we can instead do:</p>
<p class="math">\[
x_{k+1} = x_k - J(x_0)^{-1}g(x_k)
\]</p>
<p>so that all of the Jacobians are the same. This means that a single O&#40;n^3&#41; factorization can be done, with multiple O&#40;n^2&#41; calculations using the same factorization. This is known as a Quasi-Newton method. While this makes the Newton method no longer quadratically convergent, it minimizes the large constant factor on the computational cost while retaining the same dynamical properties, i.e. the same steady state and thus the same overall solution. This makes sense for sufficiently large <span class="math">$n$</span>, but requires sufficiently large <span class="math">$n$</span> because the loss of quadratic convergence means that it will take more steps to converge than before, and thus more <span class="math">$O(n^2)$</span> backsolves are required, meaning that the difference between factorizations and backsolves needs to be large enough in order to offset the cost of extra steps.</p>
<h4>Note on Sparse Factorization</h4>
<p>Note that LU-factorization, and other factorizations, have generalizations to sparse matrices where a <em>symbolic factorization</em> is utilized to compute a sparse storage of the values which then allow for a fast backsubstitution. More details are outside the scope of this course, but note that Julia and MATLAB will both use the library SuiteSparse in the background when <code>lu</code> is called on a sparse matrix.</p>
<h2>Jacobian-Free Newton Krylov &#40;JFNK&#41;</h2>
<p>An alternative method for solving the linear system is the Jacobian-Free Newton Krylov technique. This technique is broken into two pieces: the <em>jvp</em> calculation and the Krylov subspace iterative linear solver.</p>
<h3>Jacobian-Vector Products as Directional Derivatives</h3>
<p>We don&#39;t actually need to compute <span class="math">$J$</span> itself, since all that we actually need is the <code>v &#61; J*w</code>. Is it possible to compute the <em>Jacobian-Vector Product</em>, or the jvp, without producing the Jacobian?</p>
<p>To see how this is done let&#39;s take a look at what is actually calculated. Written out in the standard basis, we have that:</p>
<p class="math">\[
w_i = \sum_{j}^{m} J_{ij} v_{j}
\]</p>
<p>Now write out what <span class="math">$J$</span> means and we see that:</p>
<p class="math">\[
w_i = \sum_j^{m} \frac{df_i}{dx_j} v_j = \nabla f_i(x) \cdot v
\]</p>
<p>that is, the <span class="math">$i$</span>th component of <span class="math">$Jv$</span> is the directional derivative of <span class="math">$f_i$</span> in the direction <span class="math">$v$</span>. This means that in general, the jvp <span class="math">$Jv$</span> is actually just the directional derivative in the direction of <span class="math">$v$</span>, that is:</p>
<p class="math">\[
Jv = \nabla f \cdot v
\]</p>
<p>and therefore it has another mathematical representation, that is:</p>
<p class="math">\[
Jv = \lim_{\epsilon \rightarrow 0} \frac{f(x+v \epsilon) - f(x)}{\epsilon}
\]</p>
<p>From this alternative form it is clear that <strong>we can always compute a jvp with a single computation</strong>. Using finite differences, a simple approximation is the following:</p>
<p class="math">\[
Jv \approx \frac{f(x+v \epsilon) - f(x)}{\epsilon}
\]</p>
<p>for non-zero <span class="math">$\epsilon$</span>. Similarly, recall that in forward-mode automatic differentiation we can choose directions by seeding the dual part. Therefore, using the dual number with one partial component:</p>
<p class="math">\[
d = x + v \epsilon
\]</p>
<p>we get that</p>
<p class="math">\[
f(d) = f(x) + Jv \epsilon
\]</p>
<p>and thus a single application with a single partial gives the jvp.</p>
<h4>Note on Reverse-Mode Automatic Differentiation</h4>
<p>As noted earlier, reverse-mode automatic differentiation has its primitives compute rows of the Jacobian in the seeded direction. This means that the seeded reverse-mode call with the vector <span class="math">$v$</span> computes <span class="math">$v^T J$</span>, that is the <em>vector &#40;transpose&#41; Jacobian transpose</em>, or <em>vjp</em> for short. When discussing parameter estimation and adjoints, this shorthand will be introduced as a way for using a traditionally machine learning tool to accelerate traditionally scientific computing tasks.</p>
<h3>Krylov Subspace Methods For Solving Linear Systems</h3>
<h4>Basic Iterative Solver Methods</h4>
<p>Now that we have direct access to quick calculations of <span class="math">$Jv$</span>, how would we use this to solve the linear system <span class="math">$Jw = v$</span> quickly? This is done through <em>iterative linear solvers</em>. These methods replace the process of solving for a factorization with, you may have guessed it, a discrete dynamical system whose solution is <span class="math">$w$</span>. To do this, what we want is some iterative process so that</p>
<p class="math">\[
Jw - b = 0
\]</p>
<p>So now let&#39;s split <span class="math">$J = A - B$</span>, then if we are iterating the vectors <span class="math">$w_k$</span> such that <span class="math">$w_k \rightarrow w$</span>, then if we plug this into the previous &#40;residual&#41; equation we get</p>
<p class="math">\[
A w_{k+1} = Bw_k + b
\]</p>
<p>since when we plug in <span class="math">$w$</span> we get zero &#40;the sequence must be Cauchy so the difference <span class="math">$w_{k+1} - w_k \rightarrow 0$</span>&#41;. Thus if we can split our matrix <span class="math">$J$</span> into a component <span class="math">$A$</span> which is easy to invert and a part <span class="math">$B$</span> that is just everything else, then we would have a bunch of easy linear systems to solve. There are many different choices that we can do. If we let <span class="math">$J = L + D + U$</span>, where <span class="math">$L$</span> is the lower portion of <span class="math">$J$</span>, <span class="math">$D$</span> is the diagonal, and <span class="math">$U$</span> is the upper portion, then the following are well-known methods:</p>
<ul>
<li><p>Richardson: <span class="math">$A = \omega I$</span> for some <span class="math">$\omega$</span></p>
</li>
<li><p>Jacobi: <span class="math">$A = D$</span></p>
</li>
<li><p>Damped Jacobi: <span class="math">$A = \omega D$</span></p>
</li>
<li><p>Gauss-Seidel: <span class="math">$A = D-L$</span></p>
</li>
<li><p>Successive Over Relaxation: <span class="math">$A = \omega D - L$</span></p>
</li>
<li><p>Symmetric Successive Over Relaxation: <span class="math">$A = \frac{1}{\omega (2 - \omega)}(D-\omega L)D^{-1}(D-\omega U)$</span></p>
</li>
</ul>
<p>These decompositions are chosen since a diagonal matrix is easy to invert &#40;it&#39;s just the inversion of the scalars of the diagonal&#41; and it&#39;s easy to solve an upper or lower traingular linear system &#40;once again, it&#39;s backsubstitution&#41;.</p>
<p>Since these methods give a a linear dynamical system, we know that there is a unique steady state solution, which happens to be <span class="math">$Aw - Bw = Jw = b$</span>. Thus we will converge to it as long as the steady state is stable. To see if it&#39;s stable, take the update equation</p>
<p class="math">\[
w_{k+1} = A^{-1}(Bw_k + b)
\]</p>
<p>and check the eigenvalues of the system: if they are within the unit circle then you have stability. Notice that this can always occur by bringing the eigenvalues of <span class="math">$A^{-1}$</span> closer to zero, which can be done by multiplying <span class="math">$A$</span> by a significantly large value, hence the <span class="math">$\omega$</span> quantities. While that always works, this essentially amounts to decreasing the stepsize of the iterative process and thus requiring more steps, thus making it take more computations. Thus the game is to pick the largest stepsize &#40;<span class="math">$\omega$</span>&#41; for which the steady state is stable. We will leave that as outside the topic of this course.</p>
<h4>Krylov Subspace Methods</h4>
<p>While the classical iterative solver methods give the background for understanding an alternative to direct inversion or factorization of a matrix, the problem with that approach is that it requires the ability to split the matrix <span class="math">$J$</span>, which we would like to avoid computing. Instead, we would like to develop an iterative solver technique which instead just uses the solution to <span class="math">$Jv$</span>. Indeed there are such methods, and these are the Krylov subspace methods. A Krylov subspace is the space spanned by:</p>
<p class="math">\[
\mathcal{K}_k = \text{span} \{v,Jv,J^2 v, \ldots, J^k v\}
\]</p>
<p>There are a few nice properties about Krylov subspaces that can be exploited. For one, it is known that there is a finite maximum dimension of the Krylov subspace, that is there is a value <span class="math">$r$</span> such that <span class="math">$J^{r+1} v \in \mathcal{K}_r$</span>, which means that the complete Krylov subspace can be computed in finitely many jvp, since <span class="math">$J^2 v$</span> is just the jvp where the vector is the jvp. Indeed, one can show that <span class="math">$J^i v$</span> is linearly independent for each <span class="math">$i$</span>, and thus that maximal value is <span class="math">$m$</span>, the dimension of the Jacobian. Therefore in <span class="math">$m$</span> jvps the solution is guerenteed to live in the Krylov subspace, giving a maximal computational cost and a proof of convergence if the vector in there is the &quot;optimal in the space&quot;.</p>
<p>The most common method in the Krylov subspace family of methods is the GMRES method. Essentially, in step <span class="math">$i$</span> one computes <span class="math">$\mathcal{K}_i$</span>, and finds the <span class="math">$x$</span> that is the closest to the Krylov subspace, i.e. finds the <span class="math">$x \in \mathcal{K}_i$</span> such that <span class="math">$\Vert Jx-v \Vert$</span> is minimized. At each step, it adds the new vector to the Krylov subspace after orthgonalizing it against the other vectors via Arnoldi iterations, leading to an orthoganol basis of <span class="math">$\mathcal{K}_i$</span> which makes it easy to express <span class="math">$x$</span>.</p>
<p>While one has a guerenteed bound on the number of possible jvps in GMRES which is simply the number of ODEs &#40;since that is what determines the size of the Jacobian and thus the total dimension of the problem&#41;, that bound is not necessarily a good one. For a large sparse matrix, it may be computationally impractical to ever compute 100,000 jvps. Thus one does not typically run the algorithm to conclusion, and instead stops when <span class="math">$\Vert Jx-v \Vert$</span> is sufficiently below some user-defined error tolerance.</p>
<h2>Intermediate Conclusion</h2>
<p>Let&#39;s take a step back and see what our intermediate conclusion is. In order to solve for the implicit step, it just boils down to doing Newton&#39;s method on some <span class="math">$g(x)=0$</span>. If the Jacobian is small enough, one factorizes the Jacobian and uses Quasi-Newton iterations in order to utilize the stored LU-decomposition in multiple steps to reduce the computation cost. If the Jacobian is sparse, sparse automatic differentiation through matrix coloring is employed to directly fill the sparse matrix with less applications of <span class="math">$g$</span>, and then this sparse matrix is factorized using a sparse LU factorization.</p>
<p>When the matrix is too large, then one resorts to using a Krylov subspace method, since this only requires being able to do <span class="math">$Jv$</span> calculations. In general, <span class="math">$Jv$</span> can be done matrix-free because it is simply the directional derivative in the direction of the vector <span class="math">$v$</span>, which can be computed thorugh either numerical or forward-mode automatic differentiation. This is then used in the GMRES iterative process to find the solution in the Krylov subspace which is closest to the solution, exiting early when the residual error is small enough. If this is converging too slow, then preconditioning is used.</p>
<p>That&#39;s the basic algorithm, but what are the other important details for getting this right?</p>
<h2>The Need for Speed</h2>
<h3>Preconditioning</h3>
<p>However, the speed at GMRES convergences is dependent on the correlations between the vectors, which can be shown to be related to the condition number of the Jacobian matrix. A high condition number makes convergence slower &#40;this is the case for the traditional iterative methods as well&#41;, which in turn is an issue because it is the high condition number on the Jacobian which leads to stiffness and causes one to have to use an implicit integrator in the first place&#33;</p>
<p>To help speed up the convergence, a common technique is known as <em>preconditioning</em>. Preconditioning is the process of using a semi-inverse to the matrix in order to split the matrix so that the iterative problem that is being solved is one that is has a smaller condition number. Mathematically, it involves decomposing <span class="math">$J = P_l A P_r$</span> where <span class="math">$P_l$</span> and <span class="math">$P_r$</span> are the left and right preconditioners which have simple inverses, and thus instead of solving <span class="math">$Jx=v$</span>, we would solve:</p>
<p class="math">\[
P_l A P_r x = v
\]</p>
<p>or</p>
<p class="math">\[
A P_r x = P_l^{-1}v
\]</p>
<p>which then means that the Krylov subpace that needs to be solved for is that defined by <span class="math">$A$</span>: <span class="math">$\mathcal{K} = \text{span}\{v,Av,A^2 v, \ldots\}$</span>. There are many possible choices for these preconditioners, but they are usually problem dependent. For example, for ODEs which come from parabolic and elliptic PDE discretizations, the <em>multigrid method</em>, such as a geometric multigrid or an algebraic multigrid, is a preconditioner that can accelerate the iterative solving process. One generic preconditioner that can generally be used is to divide by the norm of the vector <span class="math">$v$</span>, which is a scaling employed by both SUNDIALS CVODE and by DifferentialEquations.jl and can be shown to be almost always advantageous.</p>
<h3>Jacobian Re-use</h3>
<p>If the problem is small enough such that the factorization is used and a Quasi-Newton technique is employed, it then holds that for most steps <span class="math">$J$</span> is only approximate since it can be using an old LU-factorization. To push it even further, high performance codes allow for <em>jacobian reuse</em>, which is allowing the same Jacobian to be reused between different timesteps. If the Jacobian is too incorrect, it can cause the Newton iterations to diverge, which is then when one would calculate a new Jacobian and compute a new LU-factorization.</p>
<h3>Adaptive Timestepping</h3>
<p>In simple cases, like partial differential equation discretizations of physical problems, the resulting ODEs are not too stiff and thus Newton&#39;s iteration generally works. However, in cases like stiff biological models, Newton&#39;s iteration can itself not always be stable enough to allow convergence. In fact, with many of the stiff biological models commonly used in benchmarks, not method is stable enough to pass without using adaptive timestepping&#33; Thus one may need to adapt the timestep in order to improve the ability for the Newton method to converge &#40;smaller timesteps increase the stabiltiy of the Newton stepping, see the homework&#41;.</p>
<p>This needs to be mixed with the Jacobian re-use strategy, since <span class="math">$J = I - \gamma \frac{df}{du}$</span> where <span class="math">$\gamma$</span> is dependent on <span class="math">$\Delta t$</span> &#40;and <span class="math">$\gamma = \Delta t$</span> for implicit Euler&#41; means that the Jacobian of the Newton method changes as <span class="math">$\Delta t$</span> changes. Thus one usually has a tiered algorithm for determining when to update the factorizations of <span class="math">$J$</span> vs when to compute a new <span class="math">$\frac{df}{du}$</span> and then refactorize. This is generally dependent on estimates of convergence rates to heuristically guess how far off <span class="math">$\frac{df}{du}$</span> is from the current true value.</p>
<p>So how does one perform adaptivity? This is generally done through a rejection sampling technique. First one needs some estimate of the error in a step. This is calculated through an <em>embedded method</em>, which is a method that is able to be cacluated without any extra <span class="math">$f$</span> evaluations that is &#40;usually&#41; one order different from the true method. The difference between the true and the embedded method is then an error estimate. If this is greater than a user chosen tolerance, the step is rejected and re-ran with a smaller <span class="math">$\Delta t$</span> &#40;possibly refactorizing, etc.&#41;. If this is less than the user tolerance, the step is accepted and <span class="math">$\Delta t$</span> is changed.</p>
<p>There are many schemes for how one can change <span class="math">$\Delta t$</span>. One of the most common is known as the <em>P-control</em>, which stands for the proportional controller which is used throughout control theory. In this case, the control is to change <span class="math">$\Delta t$</span> in propertion to the current error ratio from the desired tolerance. If we let</p>
<p class="math">\[
q = \frac{\text{E}}{\max(u_k,u_{k+1}) \tau_r + \tau_a}
\]</p>
<p>where <span class="math">$\tau_r$</span> is the relative tolerance and <span class="math">$\tau_a$</span> is the absolute tolerance, then <span class="math">$q$</span> is the ratio of the current error to the current tolerance. If <span class="math">$q<1$</span>, then the error is less than the tolerance and the step is accepted, and vice versa for <span class="math">$q>1$</span>. In either case, we let <span class="math">$\Delta t_{new} = q \Delta t$</span> be the proportional update.</p>
<p>However, proportional error control has many known features that are undesirable. For example, it happens to work in a &quot;bang bang&quot; manner, meaning that it can drastically change its behavior from step to step. One step may multiply the step size by 10x, then the next by 2x. This is an issue because it effects the stability of the ODE solver method &#40;since the stability is not a property of a single step, but rather it&#39;s a property of the global behavior over time&#41;&#33; Thus to smooth it out, one can use a <em>PI-control</em>, which modifies the control factor by a history value, i.e. the error in one step in the past. This of course also means that one can utilize a PID-controller for time stepping. And there are many other techniques that can be used, but many of the most optimized codes tend to use a PI-control mechanism.</p>
<h2>Methodological Summary</h2>
<p>Here&#39;s a quick summary of the methodologies in a hierarchical sense:</p>
<ul>
<li><p>At the lowest level is the linear solve, either done by JFNK or &#40;sparse&#41; factorization. For large enough systems, this is brunt of the work. This is thus the piece to computationally optimize as much as possible, and parallelize. For sparse factorizations, this can be done with a distributed sparse library implementation. For JFNK, the efficiency is simply due to the efficiency of your ODE function <code>f</code>.</p>
</li>
<li><p>An optional level for JFNK is the preconditioning level, where preconditioners can be used to decrease the total number of iterations required for Krylov subspace methods like GMRES to converge, and thus reduce the total number of <code>f</code> calls.</p>
</li>
<li><p>At the nonlinear solver level, different Newton-like techniques are utilized to minimize the number of factorizations/linear solves required, and maximize the stability of the Newton method.</p>
</li>
<li><p>At the ODE solver level, more efficient integrators and adaptive methods for stiff ODEs are used to reduce the cost by affecting the linear solves. Most of these calculations are dominated by the linear solve portion when it&#39;s in the regime of large stiff systems. Jacobian reuse techniques, partial factorizations, and IMEX methods come into play as ways to reduce the cost per factorization and reduce the total number of factorizations.</p>
</li>
</ul>


        <HR/>
        <div class="footer">
          <p>
            Published from <a href="stiff_odes.jmd">stiff_odes.jmd</a>
            using <a href="http://github.com/JunoLab/Weave.jl">Weave.jl</a> v0.10.6 on 2020-10-14.
          </p>
        </div>
      </div>
    </div>
  </div>
</BODY>

</HTML>
